I. HELICES IN BIOLOGICAL MOLECULES

Double-stranded DNA is a double helix [1]. The principal secondary
structures in proteins are α-helices and β-sheets, which are sheets of
helices. The fibrous protein α-keratin is a double α-helix; collagen is
a triple helix. The cytoskeletal filaments (actin filaments,
microtubules, and intermediate filaments) are helical assemblies of
subunits. Helices occur in the capsids of viruses.

Helices are important and ubiquitous in biology because identical
objects, regularly assembled, form a helix. This theorem, that a regular
assembly of identical objects is a helix, has been known in the
biological community since the work of Pauling but is less familiar to
physicists. Although it can be derived from the differential geometry of
Lancret and de Saint Venant, I am unaware of a proof of it in the
biological literature, where it often is illustrated by photographs of
stacks of identical blocks, each rotated by a fixed angle about the
vertical axis. Because of its far-reaching implications for biology, a
proof that is simple, direct, and self-contained should be useful. Such a
proof is given in Sec. II. In Sec. III some formulas about helices are
derived. In Sec. IV the theorem and these formulas are illustrated by and
applied to nucleic acids, protein secondary structures, proteins, protein
folding, and viral capsids. The theorem implies, in particular, that the
β-strand, which is the second most common secondary structure in
proteins, is a helix, and that icosahedral viral capsids are made of
helices. The paper ends with remarks about helices and evolution.



II. REGULARITY IMPLIES HELICITY 

Suppose we have a collection of identical objects, which we label with
the integers. Suppose each object has both a socket and a knob. Suppose
that every knob can fit snugly into every socket and that, once seated,
no further rotation of the knob in the socket is possible. We can set the
knob of object 1 into the socket of object 2. Then we can put the knob of
object 2 into the socket of object 3. Next, we can put the knob of object
3 into the socket of object 4. If we continue in this way, then the chain
of objects will form a helix defined by the first three objects.

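To see why, we fix our attention on a selected point, the same for all
the objects. We might choose the top of each socket. Let's call the
selected point on the $i$th object $\mathbf{p}_i$. Let
$\mathbf{a} = \mathbf{p}_2 - \mathbf{p}_1$, so
$\mathbf{p}_2 = \mathbf{a} + \mathbf{p}_1$. The knob of each object
protrudes from its object in a way that is arbitrary but the same for all
our objects. So the vector $\mathbf{b} = \mathbf{p}_3 - \mathbf{p}_2$ has
the same length as $\mathbf{a}$ and is related to it by a $3 \times 3$
rotation matrix $R$, $\mathbf{b} = R\,\mathbf{a}$. The rotation $R$
includes any extra rotation of $\phi$ radians about the $\mathbf{b}$ axis
that may occur in the rigid motion that attaches object 2 to object 1. So
$\mathbf{p}_3 = \mathbf{b} + \mathbf{p}_2 = R\,\mathbf{a} + \mathbf{a} + \mathbf{p}_1$.
What about $\mathbf{c} = \mathbf{p}_4 - \mathbf{p}_3$? Well, the lengths
of all the vectors $\mathbf{p}_{i+1} - \mathbf{p}_i$ are the same, and
they are all related by rotations. And since the objects are all
identical, the vector $\mathbf{c}$ must be related to $\mathbf{b}$ by the
rotation $R$ in the rotated frame, that is,
$\mathbf{c} = R R R^{-1}\,\mathbf{b} = R\,\mathbf{b}$. So
$\mathbf{c} = R^2\,\mathbf{a}$, and thus the point $\mathbf{p}_4$ is
given by
$\mathbf{p}_4 = \mathbf{c} + \mathbf{p}_3 = R^2\,\mathbf{a} + R\,\mathbf{a} + \mathbf{a} + \mathbf{p}_1$.
The general rule is

$$ \mathbf{p}_{n+2} = \sum_{k=0}^{n} R^k\,\mathbf{a} + \mathbf{p}_1 . \tag{1} $$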
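
As a numerical illustration of the general rule (1), the following Python
sketch builds the chain one object at a time. The function names, the
choice of a rotation about the z axis, and the sample values of the step
vector and angle are assumptions made for this example only.

```python
import math

def rotation_about_z(theta):
    """Return the 3x3 matrix for a rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(matrix, vector):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]

def chain_points(p1, a, rotation, count):
    """Return p_1, ..., p_count; each new step is the previous step rotated once more."""
    points = [list(p1)]
    step = list(a)  # R^0 a
    for _ in range(count - 1):
        points.append([x + dx for x, dx in zip(points[-1], step)])
        step = apply(rotation, step)  # R^k a becomes R^(k+1) a
    return points

# Illustrative (assumed) values: a 100 degree turn and a step vector a = (1, 0, 0.5).
R = rotation_about_z(math.radians(100.0))
points = chain_points([0.0, 0.0, 0.0], [1.0, 0.0, 0.5], R, 6)

# Every step has the same length and the same rise along the rotation axis.
step_lengths = [math.dist(p, q) for p, q in zip(points, points[1:])]
rises = [q[2] - p[2] for p, q in zip(points, points[1:])]
```

In this sketch every entry of `step_lengths` equals the length of the
first step and every entry of `rises` equals its z component, which is the
regularity that the rest of the section turns into a helix.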

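Every rotation matrix $R$ has one real eigenvector $\hat{\mathbf{n}}$
with an eigenvalue of unity, $R\,\hat{\mathbf{n}} = \hat{\mathbf{n}}$.
The eigenvector $\hat{\mathbf{n}}$ is the axis of the rotation. The caret
means that the axis $\hat{\mathbf{n}}$ is normalized, and we fix its sign
by requiring that $\mathbf{a} \cdot \hat{\mathbf{n}} > 0$. (The other two
eigenvectors $\mathbf{e}_\pm$ are complex with unimodular eigenvalues,
$R\,\mathbf{e}_\pm = e^{\pm i\theta}\,\mathbf{e}_\pm$, in which $\theta$
is the angle of the rotation $R$.)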

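Let us adopt a coordinate system in which the $z$ axis is the axis of
rotation, $\hat{\mathbf{z}} = \hat{\mathbf{n}}$, and the vector
$\mathbf{a}$ lies in the $x$-$z$ plane,
$\mathbf{a} = a_x \hat{\mathbf{x}} + a_z \hat{\mathbf{z}}$. The rotation
now is about the $z$ axis, and so

$$ R\,\mathbf{a} = R\,(a_x \hat{\mathbf{x}} + a_z \hat{\mathbf{z}}) = a_x (\cos\theta\,\hat{\mathbf{x}} + \sin\theta\,\hat{\mathbf{y}}) + a_z \hat{\mathbf{z}} \tag{2} $$

in which we choose to have $-\pi < \theta \le \pi$. If the product
$\theta\,a_z$ is positive, then the helix is right handed; if it is
negative, then the helix is left handed. The $k$th power of $R$ turns
$\mathbf{a}$ into

$$ R^k\,\mathbf{a} = R^k (a_x \hat{\mathbf{x}} + a_z \hat{\mathbf{z}}) = a_x (\cos k\theta\,\hat{\mathbf{x}} + \sin k\theta\,\hat{\mathbf{y}}) + a_z \hat{\mathbf{z}} . \tag{3} $$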
So formula (1) for the point $\mathbf{p}_{n+2}$ gives

$$ \mathbf{p}_{n+2} = (n+1)\,a_z \hat{\mathbf{z}} + \mathbf{p}_1 + a_x \sum_{k=0}^{n} (\cos k\theta\,\hat{\mathbf{x}} + \sin k\theta\,\hat{\mathbf{y}}) . \tag{4} $$



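Now by expressing $\sin k\theta$ and $\cos k\theta$ in terms of
$\exp(i\theta)$ and by using the relation
$(z - 1) \sum_{k=0}^{n} z^k = z^{n+1} - 1$, one may derive the
trigonometric identities

$$ \sum_{k=0}^{n} \cos k\theta = \tfrac{1}{2} \left[ \cos n\theta + \cot(\theta/2)\,\sin n\theta + 1 \right] \tag{5} $$

and

$$ \sum_{k=0}^{n} \sin k\theta = \tfrac{1}{2} \left[ \sin n\theta - \cot(\theta/2)\,(\cos n\theta - 1) \right] . \tag{6} $$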
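
The two identities can be checked numerically. The short Python sketch
below compares the closed forms with the direct sums; the function names
are hypothetical, and the check assumes that $\theta$ is not a multiple of
$2\pi$, where $\cot(\theta/2)$ is undefined.

```python
import math

def cos_sum_closed(n, theta):
    """Right-hand side of identity (5)."""
    return 0.5 * (math.cos(n * theta) + math.sin(n * theta) / math.tan(theta / 2) + 1.0)

def sin_sum_closed(n, theta):
    """Right-hand side of identity (6)."""
    return 0.5 * (math.sin(n * theta) - (math.cos(n * theta) - 1.0) / math.tan(theta / 2))

def direct_sums(n, theta):
    """Sum cos(k theta) and sin(k theta) term by term for k = 0, ..., n."""
    cos_total = sum(math.cos(k * theta) for k in range(n + 1))
    sin_total = sum(math.sin(k * theta) for k in range(n + 1))
    return cos_total, sin_total

# Largest disagreement over a few sample angles and chain lengths.
max_error = 0.0
for theta in (0.3, 1.7, -2.5):
    for n in range(12):
        c, s = direct_sums(n, theta)
        max_error = max(max_error,
                        abs(c - cos_sum_closed(n, theta)),
                        abs(s - sin_sum_closed(n, theta)))
```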



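By substituting these identities into Eq. (4), we find

$$ \mathbf{p}_{n+2} = (n+1)\,a_z \hat{\mathbf{z}} + \mathbf{p}_1 + \tfrac{1}{2} a_x \left[ \cos n\theta + \cot(\theta/2)\,\sin n\theta + 1 \right] \hat{\mathbf{x}} + \tfrac{1}{2} a_x \left[ \sin n\theta - \cot(\theta/2)\,(\cos n\theta - 1) \right] \hat{\mathbf{y}} . \tag{7} $$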

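If we call $\mathbf{v}$ the vector

$$ \mathbf{v} = \tfrac{1}{2} a_x \left[ \hat{\mathbf{x}} - \cot(\theta/2)\,\hat{\mathbf{y}} \right] , \tag{8} $$

then we may write the general point $\mathbf{p}_{n+2}$ as

$$ \mathbf{p}_{n+2} = R^n\,\mathbf{v} + n\,a_z \hat{\mathbf{z}} + \mathbf{p}_1 + \mathbf{a} - \mathbf{v} \tag{9} $$

which is a helix: the first term turns $\mathbf{v}$ by $n\theta$ about
$\hat{\mathbf{z}}$, the second advances by $n\,a_z$ along
$\hat{\mathbf{z}}$, and the remaining terms are constant.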

A rotation $R$ about a point $\mathbf{p}_0$ takes a point $\mathbf{p}$
into the point $\mathbf{p}'$ given by

$$ \mathbf{p}' - \mathbf{p}_0 = R\,(\mathbf{p} - \mathbf{p}_0) . \tag{10} $$



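By comparing this rule with our formula (9) for $\mathbf{p}_{n+2}$, we
may infer that $\mathbf{v} = a_x \hat{\mathbf{x}} - \mathbf{r}_0$ in
which $\mathbf{r}_0$ is the point where the axis $\hat{\mathbf{z}}$
crosses the $x$-$y$ plane; equivalently

$$ \mathbf{r}_0 = a_x \hat{\mathbf{x}} - \mathbf{v} = \tfrac{1}{2} a_x \left[ \hat{\mathbf{x}} + \cot(\theta/2)\,\hat{\mathbf{y}} \right] . \tag{11} $$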

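Equation (9) for the point $\mathbf{p}_{n+2}$ now takes the form

$$ \mathbf{p}_{n+2} - \mathbf{r}_0 = R^n (a_x \hat{\mathbf{x}} - \mathbf{r}_0) + (n+1)\,a_z \hat{\mathbf{z}} + \mathbf{p}_1 \tag{12} $$

or more simply

$$ \mathbf{p}_{n+2} - \mathbf{r}_0 = R^n (\mathbf{a} - \mathbf{r}_0) + n\,a_z \hat{\mathbf{z}} + \mathbf{p}_1 \tag{13} $$

since $R^n \hat{\mathbf{z}} = \hat{\mathbf{z}}$.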

This helix rises by $\Delta z = a_z$ with each object and turns by the
angle $\theta$ with each object, so its pitch is
$p = (2\pi/\theta)\,\Delta z = 2\pi a_z/\theta$. Its axis is
$\mathbf{r}_0 + z\,\hat{\mathbf{z}}$ for all $z$.
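
The pitch formula and the handedness rule stated after Eq. (2) are easy to
tabulate. The following Python sketch is only an illustration: the
function names are hypothetical, the angle is taken in radians, and the
sample rise and angle are assumed values rather than data from the text.

```python
import math

def objects_per_turn(theta):
    """Number of objects in one full turn, 2*pi/|theta|, with theta in radians."""
    return 2.0 * math.pi / abs(theta)

def pitch(rise, theta):
    """Pitch p = 2*pi*a_z/theta; its sign follows the sign of rise*theta."""
    return 2.0 * math.pi * rise / theta

def handedness(rise, theta):
    """Right handed if theta*a_z > 0, left handed if it is negative."""
    product = rise * theta
    if product > 0:
        return "right-handed"
    if product < 0:
        return "left-handed"
    return "undefined"

# Assumed example: a rise of 1.5 length units and a turn of 100 degrees per object.
example_theta = math.radians(100.0)
example_pitch = pitch(1.5, example_theta)      # 3.6 objects per turn times 1.5
example_hand = handedness(1.5, example_theta)
```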

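The rotation matrix $R$ is the product of a rotation
$R(\hat{\mathbf{b}}, \hat{\mathbf{a}})$ that rotates the vector
$\mathbf{a}$ into the vector $\mathbf{b}$ and a rotation
$R(\phi\,\hat{\mathbf{b}})$ about the vector $\mathbf{b}$ by a dihedral
angle $\phi$,

$$ R = R(\phi\,\hat{\mathbf{b}})\,R(\hat{\mathbf{b}}, \hat{\mathbf{a}}) . \tag{14} $$
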
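The first matrix $R(\hat{\mathbf{b}}, \hat{\mathbf{a}})$ is

$$ R(\hat{\mathbf{b}}, \hat{\mathbf{a}}) = |\widehat{\mathbf{a} \times \mathbf{b}}\rangle\langle\widehat{\mathbf{a} \times \mathbf{b}}| + |\hat{\mathbf{b}}\rangle\langle\hat{\mathbf{a}}| + |\widehat{\mathbf{a} \times \mathbf{b}} \times \hat{\mathbf{b}}\rangle\langle\widehat{\mathbf{a} \times \mathbf{b}} \times \hat{\mathbf{a}}| \tag{15} $$

in Dirac notation with the carets meaning that all the vectors are unit
vectors. The second matrix $R(\phi\,\hat{\mathbf{b}})$ is

$$ R(\phi\,\hat{\mathbf{b}}) = e^{\phi\,\hat{\mathbf{b}} \cdot \mathbf{L}} = \cos\phi\; I + \hat{\mathbf{b}} \cdot \mathbf{L}\,\sin\phi + (1 - \cos\phi)\,\hat{\mathbf{b}}\,\hat{\mathbf{b}}^{T} \tag{16} $$

in which the generators $(L_k)_{ij} = \epsilon_{ikj}$ satisfy
$[L_i, L_j] = \epsilon_{ijk} L_k$ and $T$ means transpose. In terms of
indices, this formula for
$R(\phi\,\hat{\mathbf{b}}) = e^{\phi\,\hat{\mathbf{b}} \cdot \mathbf{L}}$
is

$$ R(\phi\,\hat{\mathbf{b}})_{ij} = \delta_{ij} \cos\phi - \sin\phi\;\epsilon_{ijk}\,\hat{b}_k + (1 - \cos\phi)\,\hat{b}_i \hat{b}_j . \tag{17} $$

In these formulas, $\epsilon_{ijk}$ is totally antisymmetric with
$\epsilon_{123} = 1$, and sums over $k$ from 1 to 3 are understood.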
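
A minimal Python sketch of Eqs. (14), (15), and (17) follows. It uses
plain lists rather than a linear-algebra library; all function names are
hypothetical, indices run over 0, 1, 2 instead of 1, 2, 3, and Eq. (15) is
assumed to be used only when $\mathbf{a}$ and $\mathbf{b}$ are not
parallel.

```python
import math

def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices 0, 1, 2 (so eps_012 = 1)."""
    return (i - j) * (j - k) * (k - i) // 2

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def rotation_about(b, phi):
    """Eq. (17): rotation by phi radians about the direction of b."""
    b = unit(b)
    c, s = math.cos(phi), math.sin(phi)
    return [[(c if i == j else 0.0)
             - s * sum(levi_civita(i, j, k) * b[k] for k in range(3))
             + (1.0 - c) * b[i] * b[j]
             for j in range(3)] for i in range(3)]

def rotation_a_to_b(a, b):
    """Eq. (15): rotation about a x b carrying the direction of a into that of b."""
    a, b = unit(a), unit(b)
    u = unit(cross(a, b))                 # undefined if a and b are parallel
    kets = [u, b, cross(u, b)]
    bras = [u, a, cross(u, a)]
    return [[sum(ket[i] * bra[j] for ket, bra in zip(kets, bras))
             for j in range(3)] for i in range(3)]

def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def step_rotation(a, b, phi):
    """Eq. (14): R = R(phi b) R(b, a)."""
    return matmul(rotation_about(b, phi), rotation_a_to_b(a, b))
```

Because $R(\phi\,\hat{\mathbf{b}})$ leaves $\hat{\mathbf{b}}$ fixed, the
product returned by `step_rotation` still carries $\hat{\mathbf{a}}$ into
$\hat{\mathbf{b}}$ for any dihedral angle.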



III. PARAMETRIZING A HELIX 

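Suppose you are given a set of points that lie on a helix. How do you
find the spacing $\Delta z$, the angle $\theta$ per step, the axis
$\hat{\mathbf{n}}$, and a point $\mathbf{n}_0$ on the axis of the helix? A
helix is defined by four points $\mathbf{p}_1$, $\mathbf{p}_2$,
$\mathbf{p}_3$, $\mathbf{p}_4$. Let
$\mathbf{a} = \mathbf{p}_2 - \mathbf{p}_1$,
$\mathbf{b} = \mathbf{p}_3 - \mathbf{p}_2$, and
$\mathbf{c} = \mathbf{p}_4 - \mathbf{p}_3$. If the axis of the helix
points in the direction $\hat{\mathbf{n}}$ and $\mathbf{n}_0$ is any
point on the axis, then the axis contains the points
$\mathbf{n}_0 + z\,\hat{\mathbf{n}}$, where $z$ is any real number.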

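The points $\mathbf{p}_i$ of the helix are evenly spaced by $\Delta z$ in
the $\hat{\mathbf{n}}$ direction. The spacing $\Delta z$ is given by

$$ \Delta z = \hat{\mathbf{n}} \cdot (\mathbf{p}_2 - \mathbf{p}_1) = \hat{\mathbf{n}} \cdot \mathbf{a} . \tag{18} $$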



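Because it is constant, the spacing $\Delta z$ is also given by
$\Delta z = \hat{\mathbf{n}} \cdot (\mathbf{p}_3 - \mathbf{p}_2) = \hat{\mathbf{n}} \cdot \mathbf{b}$
and by
$\Delta z = \hat{\mathbf{n}} \cdot (\mathbf{p}_4 - \mathbf{p}_3) = \hat{\mathbf{n}} \cdot \mathbf{c}$.
Thus the axis $\hat{\mathbf{n}}$ is orthogonal to
$\mathbf{b} - \mathbf{a}$ and to $\mathbf{c} - \mathbf{b}$. So it must be
parallel (or antiparallel) to the cross product $\mathbf{n}$ of these
vectors,

$$ \mathbf{n} = (\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{b}) . \tag{19} $$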



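In terms of the length
$\ell = |\mathbf{b} - \mathbf{a}| = |\mathbf{c} - \mathbf{b}|$ and the
angle $\phi$ between the vectors $\mathbf{b} - \mathbf{a}$ and
$\mathbf{c} - \mathbf{b}$, the vector $\mathbf{n}$ is of length
$\ell^2 \sin\phi$. The general direction of the helix is defined by the
difference
$\mathbf{p}_4 - \mathbf{p}_1 = \mathbf{a} + \mathbf{b} + \mathbf{c}$; so
if $\sigma$ is the sign of the dot product
$(\mathbf{a} + \mathbf{b} + \mathbf{c}) \cdot \mathbf{n}$, then the axis
of the helix is the unit vector

$$ \hat{\mathbf{n}} = \sigma\,\frac{(\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{b})}{\ell^2 \sin\phi} . \tag{20} $$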



The three other parameters of the helix are its radius $\rho$, its angle
$\theta$, and a point $\mathbf{n}_0$ on its axis. To find these, we note
that for each of the four points $\mathbf{p}_i$,

$$ \rho^2 = \left[ \hat{\mathbf{n}} \times (\mathbf{p}_i - \mathbf{n}_0) \right]^2 . \tag{21} $$
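Subtracting this relation for $i = 1$ from this relation for $i = 2$ and
recalling that $\mathbf{a} = \mathbf{p}_2 - \mathbf{p}_1$, we get an
equation that is linear in $\mathbf{n}_0$

$$ 2\,(\hat{\mathbf{n}} \times \mathbf{a}) \cdot (\hat{\mathbf{n}} \times \mathbf{n}_0) = (\hat{\mathbf{n}} \times \mathbf{p}_2)^2 - (\hat{\mathbf{n}} \times \mathbf{p}_1)^2 . \tag{22} $$

Similarly, subtracting Eq. (21) for $i = 2$ from Eq. (21) for $i = 3$ and
recalling that $\mathbf{b} = \mathbf{p}_3 - \mathbf{p}_2$, we find

$$ 2\,(\hat{\mathbf{n}} \times \mathbf{b}) \cdot (\hat{\mathbf{n}} \times \mathbf{n}_0) = (\hat{\mathbf{n}} \times \mathbf{p}_3)^2 - (\hat{\mathbf{n}} \times \mathbf{p}_2)^2 . \tag{23} $$

An orthonormal basis is provided by the three vectors

$$ \hat{\mathbf{e}}_1 = \frac{\mathbf{b} - \mathbf{a}}{|\mathbf{b} - \mathbf{a}|}, \qquad \hat{\mathbf{e}}_2 = \hat{\mathbf{n}} \times \hat{\mathbf{e}}_1, \qquad \text{and} \qquad \hat{\mathbf{e}}_3 = \hat{\mathbf{n}} . \tag{24} $$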

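Subtracting Eq. (22) from Eq. (23), we get

$$ 2\,[\hat{\mathbf{n}} \times (\mathbf{b} - \mathbf{a})] \cdot (\hat{\mathbf{n}} \times \mathbf{n}_0) = (\hat{\mathbf{n}} \times \mathbf{p}_3)^2 - 2\,(\hat{\mathbf{n}} \times \mathbf{p}_2)^2 + (\hat{\mathbf{n}} \times \mathbf{p}_1)^2 \tag{25} $$

or, using (24),

$$ 2\,|\mathbf{b} - \mathbf{a}|\,(\hat{\mathbf{n}} \times \hat{\mathbf{e}}_1) \cdot (\hat{\mathbf{n}} \times \mathbf{n}_0) = (\hat{\mathbf{n}} \times \mathbf{p}_3)^2 - 2\,(\hat{\mathbf{n}} \times \mathbf{p}_2)^2 + (\hat{\mathbf{n}} \times \mathbf{p}_1)^2 . \tag{26} $$

So in terms of the definition

$$ C_{321} = \frac{(\hat{\mathbf{n}} \times \mathbf{p}_3)^2 - 2\,(\hat{\mathbf{n}} \times \mathbf{p}_2)^2 + (\hat{\mathbf{n}} \times \mathbf{p}_1)^2}{2\,|\mathbf{b} - \mathbf{a}|} , \tag{27} $$

we may use (24) again to write Eq. (26) as

$$ \hat{\mathbf{e}}_2 \cdot (\hat{\mathbf{n}} \times \mathbf{n}_0) = C_{321} . \tag{28} $$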

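Since the unit vectors $\hat{\mathbf{e}}_i$ are complete and orthonormal,
we may expand the axis point $\mathbf{n}_0$ as

$$ \mathbf{n}_0 = \sum_{i=1}^{3} (\hat{\mathbf{e}}_i \cdot \mathbf{n}_0)\,\hat{\mathbf{e}}_i . \tag{29} $$

Using (24) and the relations
$\hat{\mathbf{n}} \times \hat{\mathbf{e}}_2 = -\hat{\mathbf{e}}_1$ and
$\hat{\mathbf{n}} \times \hat{\mathbf{e}}_3 = 0$, we have

$$ \hat{\mathbf{n}} \times \mathbf{n}_0 = (\hat{\mathbf{e}}_1 \cdot \mathbf{n}_0)\,\hat{\mathbf{e}}_2 - (\hat{\mathbf{e}}_2 \cdot \mathbf{n}_0)\,\hat{\mathbf{e}}_1 . \tag{30} $$

So Eq. (28) now implies

$$ \hat{\mathbf{e}}_1 \cdot \mathbf{n}_0 = C_{321} . \tag{31} $$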
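Expanding the vector $\mathbf{a}$ in terms of the basis
$\{\hat{\mathbf{e}}_i\}$

$$ \mathbf{a} = \sum_{i=1}^{3} (\hat{\mathbf{e}}_i \cdot \mathbf{a})\,\hat{\mathbf{e}}_i \tag{32} $$

and using (24), we find

$$ \hat{\mathbf{n}} \times \mathbf{a} = (\hat{\mathbf{e}}_1 \cdot \mathbf{a})\,\hat{\mathbf{e}}_2 - (\hat{\mathbf{e}}_2 \cdot \mathbf{a})\,\hat{\mathbf{e}}_1 . \tag{33} $$

This relation and Eq. (30) imply

$$ (\hat{\mathbf{n}} \times \mathbf{a}) \cdot (\hat{\mathbf{n}} \times \mathbf{n}_0) = (\hat{\mathbf{e}}_1 \cdot \mathbf{a})\,(\hat{\mathbf{e}}_1 \cdot \mathbf{n}_0) + (\hat{\mathbf{e}}_2 \cdot \mathbf{a})\,(\hat{\mathbf{e}}_2 \cdot \mathbf{n}_0) . \tag{34} $$

Using this expression and the notation

$$ C_{21} = \tfrac{1}{2} \left[ (\hat{\mathbf{n}} \times \mathbf{p}_2)^2 - (\hat{\mathbf{n}} \times \mathbf{p}_1)^2 \right] , \tag{35} $$

we extract from Eq. (22) the result

$$ (\hat{\mathbf{e}}_1 \cdot \mathbf{a})\,(\hat{\mathbf{e}}_1 \cdot \mathbf{n}_0) + (\hat{\mathbf{e}}_2 \cdot \mathbf{a})\,(\hat{\mathbf{e}}_2 \cdot \mathbf{n}_0) = C_{21} \tag{36} $$

or, using (31),

$$ \hat{\mathbf{e}}_2 \cdot \mathbf{n}_0 = \frac{C_{21} - (\hat{\mathbf{e}}_1 \cdot \mathbf{a})\,C_{321}}{\hat{\mathbf{e}}_2 \cdot \mathbf{a}} . \tag{37} $$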



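The inner product $\hat{\mathbf{e}}_3 \cdot \mathbf{n}_0$ is arbitrary. So
by substituting our formulas (31) for
$\hat{\mathbf{e}}_1 \cdot \mathbf{n}_0$ and (37) for
$\hat{\mathbf{e}}_2 \cdot \mathbf{n}_0$ into the expansion (29), we have a
set of points $\mathbf{n}_0$ on the axis of the helix in terms of the free
parameter $\hat{\mathbf{e}}_3 \cdot \mathbf{n}_0$.

To find the radius $\rho$ from Eq. (21), we use Eq. (30) for the cross
product $\hat{\mathbf{n}} \times \mathbf{n}_0$:

$$ \rho = \sqrt{ \left[ \hat{\mathbf{n}} \times \mathbf{p}_1 + (\hat{\mathbf{e}}_2 \cdot \mathbf{n}_0)\,\hat{\mathbf{e}}_1 - (\hat{\mathbf{e}}_1 \cdot \mathbf{n}_0)\,\hat{\mathbf{e}}_2 \right]^2 } , \tag{38} $$

where the axis $\hat{\mathbf{n}}$ and the inner products
$\hat{\mathbf{e}}_2 \cdot \mathbf{n}_0$ and
$\hat{\mathbf{e}}_1 \cdot \mathbf{n}_0$ are given respectively by (20),
(37), and (31).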

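The cosine of the angle $\theta$ is

$$ \cos\theta = \rho^{-2}\,[\hat{\mathbf{n}} \times (\mathbf{p}_1 - \mathbf{n}_0)] \cdot [\hat{\mathbf{n}} \times (\mathbf{p}_2 - \mathbf{n}_0)] , \tag{39} $$

and its sine is

$$ \sin\theta = \rho^{-2}\,\hat{\mathbf{n}} \cdot [\hat{\mathbf{n}} \times (\mathbf{p}_1 - \mathbf{n}_0)] \times [\hat{\mathbf{n}} \times (\mathbf{p}_2 - \mathbf{n}_0)] . \tag{40} $$

So the angle $\theta$ is the argument of the complex number
$(\cos\theta, \sin\theta)$ in the interval $-\pi < \theta \le \pi$, which
is given by the FORTRAN arctangent function atan2 as
$\theta = \mathrm{atan2}(\sin\theta, \cos\theta)$.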
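
The procedure of Eqs. (18)-(40) can be collected into one routine. The
Python sketch below is a minimal illustration under stated assumptions,
not a tested production implementation: the function and variable names
are hypothetical, the free parameter
$\hat{\mathbf{e}}_3 \cdot \mathbf{n}_0$ is set to zero, and the four
points are assumed to be non-degenerate (in particular
$\hat{\mathbf{e}}_2 \cdot \mathbf{a} \ne 0$, which fails when $\theta$ is
exactly $\pi$).

```python
import math

def sub(u, v):
    return [x - y for x, y in zip(u, v)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def scale(s, u):
    return [s * x for x in u]

def unit(u):
    return scale(1.0 / math.sqrt(dot(u, u)), u)

def helix_parameters(p1, p2, p3, p4):
    """Return (axis n, axis point n0, radius rho, rise dz, angle theta in radians)."""
    a, b, c = sub(p2, p1), sub(p3, p2), sub(p4, p3)

    # Eqs. (19)-(20): axis direction, signed so that the helix advances along +n.
    n = cross(sub(b, a), sub(c, b))
    if dot([a[i] + b[i] + c[i] for i in range(3)], n) < 0:
        n = scale(-1.0, n)
    n = unit(n)
    rise = dot(n, a)                                   # Eq. (18)

    # Eq. (24): orthonormal basis e1, e2, with e3 = n.
    e1 = unit(sub(b, a))
    e2 = cross(n, e1)

    def sq(p):
        """(n x p)^2, the squared distance of p from the line through the origin along n."""
        w = cross(n, p)
        return dot(w, w)

    ell = math.sqrt(dot(sub(b, a), sub(b, a)))
    c321 = (sq(p3) - 2.0 * sq(p2) + sq(p1)) / (2.0 * ell)   # Eq. (27)
    c21 = 0.5 * (sq(p2) - sq(p1))                            # Eq. (35)

    n0_e1 = c321                                             # Eq. (31)
    n0_e2 = (c21 - dot(e1, a) * c321) / dot(e2, a)           # Eq. (37)
    # Eq. (29) with the free component e3 . n0 chosen to be zero (an assumption).
    n0 = [n0_e1 * e1[i] + n0_e2 * e2[i] for i in range(3)]

    u1 = cross(n, sub(p1, n0))
    u2 = cross(n, sub(p2, n0))
    rho = math.sqrt(dot(u1, u1))                             # Eqs. (21) and (38)
    cos_theta = dot(u1, u2) / rho ** 2                       # Eq. (39)
    sin_theta = dot(n, cross(u1, u2)) / rho ** 2             # Eq. (40)
    theta = math.atan2(sin_theta, cos_theta)
    return n, n0, rho, rise, theta
```

A simple way to exercise the routine is to generate four points on a helix
with known axis, radius, rise, and angle and confirm that the same values
come back. Nearly planar helices with $\theta$ close to $\pm\pi$ make the
denominator of Eq. (37) small, so results there should be treated with
care.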

IV. EXAMPLES OF BIO-HELICES 

DNA: Although DNA is made out of nucleotides, its building block is the
object dR-B-B'-dR in which dR is a deoxyribose sugar and B-B' is a
Crick-Watson base pair of adenine and thymine (A=T or T=A) or of cytosine
and guanine (C=G or G=C). The four base pairs have nearly the same size,
and so the four units dR-B-B'-dR are nearly identical. Phosphate groups
glue these units into a regular chain. Each dR is linked by one phosphate
group to the unit behind it and by another phosphate group to the unit
ahead of it. This pattern of covalent bonds is nearly the same in all
dR-B-B'-dR units. The result is a helix or a double helix [1] if one
counts both chains of sugar-phosphate groups. For instance, the ideal
B-DNA dodecamer d(CGCGAATTCGCG) at 1.4 Å resolution is a right-handed
double helix with $a_z = \Delta z = 3.3$ Å, $\theta = 35.5^\circ$, a
diameter of 20 Å, and a pitch of 33.3 Å. But other sequences of base
pairs have $\theta$ as low as $26^\circ$ or as high as $43^\circ$. When
the relative humidity is below 75%, B-DNA turns into the A form, which is
a right-handed helix with a pitch of 34 Å, but with
$\theta = 26^\circ$. The Z form, which can occur when the salt
concentration is high, is a left-handed helix with $\theta = 18^\circ$
and a pitch of 44 Å. The dodecamer d((AT)$_6$) forms coiled coils.

Secondary Structures of Proteins: Proteins are chains of amino acids.
Except for proline, the 20 amino acids differ only in their side chains.
The amino acids all have the same main chain N-C-C and are linked
together N-C-C~N-C-C~N-C-C by peptide bonds (marked here by ~), which
resist rotations: the angle $\omega$ about the C-N bond usually is close
to $180^\circ$. The dihedral angles $\phi$ and $\psi$ describe rotations
about the axes of the single bonds N-C and C-C. These angles are the
principal degrees of freedom in proteins, but they are far from free.
Ramachandran steric constraints force them to lie in three regions, more
(proline) or less (glycine).

The α region lies near $\phi = -57^\circ$, $\psi = -47^\circ$, and
$\omega = 180^\circ$. A chain of amino acids with these dihedral angles
is an α-helix. By using Eqs. (18)-(40), one may show that the ideal
α-helix is right handed and that it has 3.62 residues per turn,
$\theta = 99.4^\circ$, $a_z = \Delta z = 1.56$ Å, and a pitch of 5.64 Å.
This geometry allows the carboxyl oxygen of the $i$th amino acid to flirt
with the hydrogen of the main-chain nitrogen of the $(i+4)$th amino acid;
the energy of the resulting N$_{i+4}$-H···O=C$_i$ hydrogen bond is of the
order of 0.3 eV.

The other two sterically allowed regions are side by side. The more
important one, near $\phi = -139^\circ$, $\psi = 135^\circ$, and
$\omega = -178^\circ$, generates helices that form hydrogen bonds between
their main-chain amino and carboxyl groups when the helices are adjacent
and antiparallel, forming an antiparallel β-sheet. Formulas (18)-(40)
imply that the ideal antiparallel β-helix has 2.004 residues per turn,
$a_z = \Delta z = 3.47$ Å, and a pitch of 6.95 Å. Although slightly left
handed with $\theta = -179.7^\circ$, it is nearly planar. Changes of
$\phi$, $\psi$, and $\omega$ by $1^\circ$ flip the angle $\theta$ across
the cut at $\theta = \pi$; so antiparallel β-helices do not have definite
helicity.

The other region, near $\phi = -119^\circ$, $\psi = 113^\circ$, and
$\omega = 180^\circ$, generates helices that form hydrogen bonds between
their main-chain amino and carboxyl groups when the helices are adjacent
and parallel, forming a parallel β-sheet. The ideal parallel β-helix has
2.024 residues per turn, $a_z = \Delta z = 3.27$ Å, and a pitch of
6.62 Å. Although somewhat right handed with $\theta = 177.8^\circ$, it is
nearly planar. Changes of $\phi$, $\psi$, and $\omega$ by 3 or $4^\circ$
can flip the angle $\theta$ across the cut at $\theta = \pi$, and so
parallel β-helices do not have definite helicity. In a parallel β-sheet,
the distance along its main chain between an amino group and the carboxyl
group to which it hydrogen-bonds is greater than in an antiparallel
β-sheet (or an α-helix), and so proteins with parallel β-sheets fold
slowly.

Students would get a more unified view of secondary structure in proteins
and nucleic acids if authors of biochemistry textbooks called β-strands
"β-helices."

Proteins: The main fibrous protein in hair, horn, and nails, α-keratin,
is two α-helices wrapped around each other in a left-handed double helix.
The key protein of the extracellular matrix holding cells in animal
tissue, collagen, is three helices in a right-handed triple helix. Actin
and tubulin form helical cytoskeletal filaments.

Globular and transmembrane proteins are α- and β-helices linked by loops
and turns. They can be as dense as crystals; α-helices pack closely [13].

A Remark about Protein Folding: Proline aside, any string of amino acids
can fold into an α- or a β-helix. How does it decide? The solvent helps
it decide. Proteins fold in salty water, a polar solvent. A protein in a
polar solvent has a lower energy if the hydrophilic (charged or polar)
side chains are on the outside and the hydrophobic ones are inside.
Suppose two hydrophilic side chains are separated on the main chain by
$n$ hydrophobic ones. They will form a helix that cuts through the
protein on a chord of length $n\,\Delta z$ with the two hydrophilic ones
outside at its ends. But $\Delta z_\beta = 3.4$ Å for a β-helix is about
twice $\Delta z_\alpha = 1.6$ Å for an α-helix. So the choice between the
two kinds of helices is decided in part by whether $n\,\Delta z_\alpha$
or $n\,\Delta z_\beta$ is closer to the thickness of the protein.
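
The comparison in the last sentence can be written as a toy calculation.
The Python sketch below is only a restatement of that sentence under
strong simplifying assumptions: the function name and the sample
thickness are hypothetical, the rises are the rounded values 1.6 Å and
3.4 Å quoted above, and every other influence on folding is ignored.

```python
RISE_ALPHA = 1.6  # rise per residue of an alpha-helix in angstroms, from the text
RISE_BETA = 3.4   # rise per residue of a beta-helix in angstroms, from the text

def closer_helix(n_hydrophobic, thickness):
    """Return which chord length, n*dz_alpha or n*dz_beta, is nearer the thickness."""
    span_alpha = n_hydrophobic * RISE_ALPHA
    span_beta = n_hydrophobic * RISE_BETA
    if abs(span_alpha - thickness) < abs(span_beta - thickness):
        return "alpha"
    return "beta"  # ties are assigned to beta in this sketch

# Assumed example: a protein about 17 angstroms thick along the chord.
choice_for_ten = closer_helix(10, 17.0)   # alpha chord 16, beta chord 34
choice_for_five = closer_helix(5, 17.0)   # alpha chord 8, beta chord 17
```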

Viral Capsids: The coats (or capsids) of filamentary viruses often are
made of a single helix. For instance, the helical capsid of the tobacco
mosaic virus consists of 2130 copies of a single protein [14].

The coats of icosahedral viruses [15] are made of nested helices in which
$T = h^2 + hk + k^2 = 1, 3, 4, 7, \ldots$ protein molecules form an
"asymmetric unit." Three of these asymmetric units form a triangular,
primary helix of zero pitch. In turn, ten of these primary, triangular
helices form two pentagonal, zero-pitch, secondary helices, and another
ten of them form a secondary, beltlike, zero-pitch deca-helix. An
icosahedron results when the two secondary pentagonal helices attach to
opposite sides of the secondary belt of ten triangular helices. The
capsid is made of $60T$ protein molecules.
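
The allowed values of $T$ and the corresponding capsid sizes can be listed
with a few lines of Python. This is a sketch for illustration; the
function names are hypothetical, and $h$ and $k$ are assumed to be
non-negative integers that are not both zero.

```python
def triangulation_numbers(limit):
    """Sorted distinct values of T = h^2 + h*k + k^2 that do not exceed limit."""
    values = set()
    for h in range(limit + 1):
        for k in range(h + 1):          # k <= h avoids counting (h, k) and (k, h) twice
            t = h * h + h * k + k * k
            if 0 < t <= limit:
                values.add(t)
    return sorted(values)

def capsid_size(t):
    """Number of protein molecules in a capsid with triangulation number t."""
    return 60 * t

allowed = triangulation_numbers(13)
sizes = [capsid_size(t) for t in allowed]
```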

Why all the helices? Evolution, size, and geometry require them.
Evolution forces economy. Cells use mass production to achieve economy,
making many copies of identical or closely related objects. Cells are too
small to have workers, so they use macromolecules. DNA polymerase makes
DNA; RNA polymerase makes RNA; ribosomes make proteins; proteins fold
automatically or via chaperones; protein complexes self-assemble. These
processes assemble nearly identical objects in regular ways. Helices are
ubiquitous because identical objects, regularly assembled, form helices.